Song Popularity with SpotifyAPI

An analysis of song popularity through Spotify-generated characteristics.

Hannah Ahn

Introduction

What makes a song… popular? Trying to evaluate that empirically is difficult. There are so many variables when it comes to creating a song: melody, rhythm/tempo, lyrics, structure, message, and dynamics are just a few that come to mind, and each variable has within it more variation. That is to say, a combination of many things contributes to a song’s popularity. While I can’t answer the question “what makes a song popular?” within the scope of this project, I can evaluate the relationship between three musical features (danceability, valence, and speechiness) and popularity across different genres. For every track on its platform, Spotify provides data for thirteen “audio features.” I’ve chosen to focus on three: danceability, valence, and speechiness. Spotify defines them as follows:

Danceability: Describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity.

Valence: Describes the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

Speechiness: This detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value.

In this study I evaluate six different genres of music (pop, edm, rap, r&b, latin, and rock) with these three features, and hypothesize that each can influence a song’s popularity within each genre. Understanding these relationships is useful not only for the music industry but also for music researchers, as it can inform musicians, producers, and playlist curators about the elements that contribute to a song’s popularity. This analysis contributes to the broader understanding of how musical preferences vary across genres and the impact of specific features on a song’s overall appeal.

Data

I initially tried to use SpotifyAPI directly through the spotifyr package, but kept getting various server errors. Instead, I decided to use an existing dataset on Kaggle that was created using SpotifyAPI. The dataset includes almost 30,000 songs from Spotify of various genres and national origins. It records the following variables for each song:

track_id
track_name
track_artist
track_popularity
track_album_id
track_album_name
track_album_release_date
playlist_name
playlist_id
playlist_genre
playlist_subgenre
danceability
energy
key
loudness
mode
speechiness
acousticness
instrumentalness
liveness
valence
tempo
duration_ms

The independent variables in this study are the six genres (pop, rap, rock, r&b, latin, and edm) and three musical attributes (danceability, valence, and speechiness). The three attributes are scores created by Spotify, ranging from 0 to 1 (that is, 0 - 100%) and recorded as decimals to three places. While Spotify’s exact specifications for what constitutes each attribute are opaque, it seems as though they take into account tempo, dynamics, key, and other variables for danceability and valence, while speechiness is an evaluation of the ratio of words to music.

The dependent variable is song popularity, which is measured by Spotify. It is a popularity index, from 0 - 100, that ranks how popular a track is relative to other tracks on Spotify. Increasing this score can increase a song’s algorithmic reach (appearing in people’s recommendations and Spotify-generated playlists). According to Spotify, the popularity index is influenced by recent stream count, save rate, number of playlists, skip rate, and share rate.

This study uses a cross-sectional design; the data was collected at a single point in time using SpotifyAPI, and the analysis examines the relationships between variables as they exist at that particular moment. The data is observational, and the study is neither a before-and-after nor a difference-in-differences design.

My primary hypothesis posits that certain musical attributes can predict a track’s popularity within distinct genres. More specifically, I hypothesize that songs with higher danceability will be more popular in genres like pop, edm, latin, and rap, but less popular in r&b and rock, and that the same relationship exists with valence. I also hypothesize that songs with higher speechiness will be less popular in genres like pop and edm, but more popular in the remaining genres.

I believe these hypotheses might be true because genres like edm, pop, latin, and rap often prioritize energetic and uplifting music that encourages movement. Higher danceability and valence align with the upbeat and lively nature of pop, edm, latin, and rap, making such songs more likely to be embraced by listeners seeking an energetic experience. R&b and rock genres often feature a broader range of emotional expression, including slower tempos and more complex arrangements. Higher danceability and valence may be perceived as less characteristic of the emotional depth and complexity associated with these genres, potentially making them less popular among audiences seeking a different musical experience.

With regards to speechiness, pop and edm genres often emphasize catchy melodies and instrumentals, with sung vocals playing a crucial role. Higher speechiness, indicating a greater presence of spoken rather than sung words, might be perceived as less favorable in genres where melodic, instrumental, and electronic elements are more prominent. In rap, r&b, latin, and rock genres, lyrics and vocal expression often play a more central role. Higher speechiness may be associated with lyrical depth and storytelling, enhancing the appeal of songs in these genres. Additionally, genres like rap often rely on clear and impactful vocal delivery, making speechiness a potentially positive factor in popularity.

My hypotheses would be supported by the following: observing a positive relationship between danceability/valence and song popularity in pop, edm, latin, and rap; observing a negative relationship between danceability/valence and song popularity in r&b and rock; observing a negative relationship between speechiness and song popularity in pop and edm; and observing a positive relationship between speechiness and song popularity in rap, r&b, latin, and rock. If I find the opposite to be true of these proposed relationships, those findings would provide evidence against my hypotheses.

The bar plot below illustrates the distribution of the dependent variable, popularity. As you can see, there are a fair number of songs that have a low score. While it may seem like the data is skewed, 50 is considered a “good” score by Spotify: a score of 50 will generally get picked up by an algorithm and recommended to users. I would consider the distribution of popularity and the sample of songs to be reasonable and not very biased.

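# Packages used throughout; `songs` is the Kaggle dataset loaded as a data frame
library(dplyr)        # filter()
library(ggplot2)      # plotting
library(ggthemes)     # theme_few()
library(plotly)       # ggplotly()
library(modelsummary) # msummary()

# Creating a bar plot that shows distribution of popularity
popularity_bar <- ggplot(songs, aes(x = track_popularity)) +
  geom_bar() +
  labs(
    title = "Popularity",
    x = "Popularity Score",
    y = "Number of Songs"
  ) +
  theme_few()

popularity_bar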
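
The shape of this distribution can also be summarised numerically. The following is a minimal sketch, not part of the original analysis: `popularity_summary()` is a hypothetical helper, the default threshold of 50 reflects the “good” score mentioned above, and `toy_scores` is made-up data standing in for `songs$track_popularity`.

```r
# Hypothetical helper: summarise a vector of 0-100 popularity scores.
popularity_summary <- function(popularity, threshold = 50) {
  c(
    median             = median(popularity),
    share_zero         = mean(popularity == 0),         # proportion of songs scoring 0
    share_at_threshold = mean(popularity >= threshold)  # proportion at or above the threshold
  )
}

# Made-up scores for illustration only; with the real data this would be
# popularity_summary(songs$track_popularity).
toy_scores <- c(0, 0, 12, 35, 50, 61, 78, 90)
toy_summary <- popularity_summary(toy_scores)
toy_summary
```

Reporting the share of songs at zero alongside the share at or above the threshold makes it easier to judge how much of the apparent skew comes from a cluster of very low scores.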

Results

The following plots visualize the relationships outlined in the Data section.

# Creating scatter plots

# Scatter plot with regression line for Danceability vs Track Popularity
danceability_plot <- ggplot(songs, mapping = aes(
  x = danceability, 
  y = track_popularity, 
  fill = playlist_genre)) +
  geom_point(alpha = 0.1, stroke = NA) +
  geom_smooth(method = "loess", se = FALSE, color = "black") +
  labs(
    title = "Danceability vs Popularity", 
    x = "Danceability", 
    y = "Popularity", 
    fill = "Playlist Genre") +
  facet_wrap(~ playlist_genre) +
  theme_few()

# Scatter plot with regression line for Valence vs Track Popularity
valence_plot <- ggplot(songs, mapping = aes(
  x = valence, 
  y = track_popularity, 
  fill = playlist_genre)) +
  geom_point(alpha = 0.1, stroke = NA) +
  geom_smooth(method = "loess", se = FALSE, color = "black") +
  labs(
    title = "Valence vs Popularity", 
    x = "Valence", 
    y = "Popularity", 
    fill = "Playlist Genre") +
  facet_wrap(~ playlist_genre) +
  theme_few()

# Scatter plot with regression line for Speechiness vs Track Popularity
speechiness_plot <- ggplot(songs, mapping = aes(
  x = speechiness, 
  y = track_popularity, 
  fill = playlist_genre)) +
  geom_point(alpha = 0.1, stroke = NA) +
  geom_smooth(method = "loess", se = FALSE, color = "black") +
  labs(
    title = "Speechiness vs Popularity", 
    x = "Speechiness", 
    y = "Popularity", 
    fill = "Playlist Genre") +
  facet_wrap(~ playlist_genre) +
  theme_few()

# Showing Plots
ggplotly(danceability_plot)
ggplotly(valence_plot)
ggplotly(speechiness_plot)

I’ve used loess smoothing here to capture more of the nuance in the data before conducting a linear regression analysis. From these visualizations alone, we can observe the following.

The Danceability vs Popularity graphs show us that there may be a positive correlation between danceability and popularity for pop, rap, r&b, and rock.

The Valence vs Popularity graphs show us that there may be a positive correlation between valence and popularity for edm, and perhaps an ever so slight positive correlation for pop and rock. It also looks as though there may be a slight negative correlation for latin and rap, and a negative correlation for r&b.

The Speechiness vs Popularity graphs show that, generally, songs are not very “speechy.” This makes sense, considering that a song that is mostly words and no music would not be much of a song. According to the graphs, there may be a slightly positive correlation between speechiness and popularity for edm, r&b, and rock.

Now, let us more rigorously evaluate the relationships illustrated above. Below is a large regression table that details the coefficients and intercepts, along with p-values.

# Running regressions for each genre

# Get list of unique genres and create empty list to store
genres <- unique(songs$playlist_genre)
regression_models <- list()

# Loop through each genre: subset to that genre's songs, fit the same linear model, and store it by genre name
for (genre in genres) {
  genre_songs <- filter(songs, playlist_genre == genre)
  regression_models[[genre]] <- lm(
    track_popularity ~ valence + danceability + speechiness,
    data = genre_songs
  )
}

varnames <- c("(Intercept)" = "Intercept", 
              "valence" = "Valence", 
              "danceability" = "Danceability", 
              "speechiness" = "Speechiness")

# Create summary table
summary_table <- msummary(regression_models,
                          statistic = c("s.e. = {std.error}",
                                        "p = {p.value}"),
                          gof_map = c("nobs", "r.squared", "adj.r.squared"),
                          coef_map = varnames)

summary_table

              pop      rap      rock     latin    r&b      edm
Intercept     34.174   28.850   35.547   40.531   43.749   34.671
  s.e.        1.737    1.692    1.513    2.246    1.740    1.609
  p           <0.001   <0.001   <0.001   <0.001   <0.001   <0.001
Valence       −1.475   −5.409   −3.958   −3.616   −17.933  14.685
  s.e.        1.644    1.414    1.777    1.684    1.676    1.413
  p           0.370    <0.001   0.026    0.032    <0.001   <0.001
Danceability  19.136   27.125   15.402   10.148   10.260   −8.805
  s.e.        2.832    2.294    2.929    3.266    2.754    2.589
  p           <0.001   <0.001   <0.001   0.002    <0.001   <0.001
Speechiness   28.106   −12.091  5.034    14.100   1.070    0.540
  s.e.        4.973    2.323    7.857    4.044    3.266    4.182
  p           <0.001   <0.001   0.522    <0.001   0.743    0.897
Num.Obs.      5507     5746     4951     5155     5431     6043
R2            0.015    0.029    0.006    0.005    0.021    0.018
R2 Adj.       0.015    0.029    0.005    0.004    0.020    0.017

This table might be slightly difficult to parse at a glance. I’ll briefly summarize by genre.

The pop genre had a negative valence coefficient (-1.475) that was not statistically significant (p = 0.370). However, its danceability coefficient was positive (19.136), and statistically significant (p < 0.001). The speechiness coefficient was positive (28.106) and statistically significant (p < 0.001).

It is important to remember, here, how these features are scaled. Each musical feature is a decimal value between 0 and 1, denoting a percentage. A one-unit increase therefore spans the entire range of the feature, from 0 to 1, and cannot occur for a song whose valence, for example, is already 0.5 (50%). So while the danceability coefficient above indicates that a one-unit increase in danceability is associated with an increase of about 19 popularity points, such an increase is rarely seen in reality. A more realistic comparison is a 0.1 increase in danceability, which corresponds to an increase of about 1.9 points.
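
To make the scaling point concrete, here is a small sketch that is not part of the original analysis. The coefficient is the pop-genre danceability estimate copied from the table above; the 0.1 step and the simulated `toy` data are assumptions chosen purely for illustration.

```r
# Pop-genre danceability coefficient, copied from the regression table.
danceability_coef <- 19.136

# Expected change in popularity for a 0.1 (ten percentage point) increase.
step <- 0.1
change_per_step <- danceability_coef * step
change_per_step

# The same idea shown on simulated data: multiplying a predictor by 100
# divides its fitted slope by 100 without changing the model's fit.
set.seed(1)
toy <- data.frame(danceability = runif(200))
toy$track_popularity <- 35 + 19 * toy$danceability + rnorm(200, sd = 10)
toy$danceability_pct <- toy$danceability * 100

fit_unit <- lm(track_popularity ~ danceability, data = toy)
fit_pct  <- lm(track_popularity ~ danceability_pct, data = toy)

slope_unit <- unname(coef(fit_unit)["danceability"])
slope_pct  <- unname(coef(fit_pct)["danceability_pct"])
c(slope_unit = slope_unit, slope_pct_times_100 = slope_pct * 100)
```

Either reading describes the same model; only the unit in which a “one-unit increase” is expressed changes.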

The rap genre had a negative valence coefficient (-5.409) that was statistically significant (p < 0.001). Its danceability coefficient was positive (27.125) and statistically significant (p < 0.001). Its speechiness coefficient was negative (-12.091) and statistically significant (p < 0.001).

The rock genre had a negative valence coefficient (-3.958) that was statistically significant (p = 0.026). It had a positive danceability coefficient (15.402) that was statistically significant (p < 0.001). It had a positive speechiness coefficient (5.034) that was not statistically significant (p = 0.522).

The latin genre had a negative valence coefficient (-3.616) that was statistically significant (p = 0.032). Its danceability coefficient was positive (10.148) and statistically significant (p = 0.002). The speechiness coefficient was also positive (14.100) and statistically significant (p < 0.001).

The r&b genre exhibited a negative valence coefficient (-17.933) that was statistically significant (p < 0.001). It demonstrated a positive danceability coefficient (10.260) that was statistically significant (p < 0.001). The speechiness coefficient (1.070) was positive but not statistically significant (p = 0.743).

In the edm genre, the valence coefficient (14.685) was statistically significant and positive (p < 0.001). The danceability coefficient (-8.805) was negative and statistically significant (p < 0.001). The speechiness coefficient (0.540) was positive but not statistically significant (p = 0.897).

I’ve neglected speaking to the intercepts as I do not see them as incredibly relevant to the analysis - the intercepts are the supposed popularity of a song within the genre if valence, danceability, and speechiness were all 0.
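
As an illustration of how the intercept and coefficients combine, the sketch below computes a fitted popularity score from the pop-genre estimates in the table above. The feature values in `hypothetical_song` are invented for the example and do not describe any real track.

```r
# Pop-genre estimates, copied from the regression table.
pop_coefs <- c(
  intercept    = 34.174,
  valence      = -1.475,
  danceability = 19.136,
  speechiness  = 28.106
)

# Invented feature values for a hypothetical pop song (each on the 0-1 scale).
hypothetical_song <- c(valence = 0.5, danceability = 0.7, speechiness = 0.1)

# Fitted value = intercept + sum of (coefficient * feature value).
predicted_popularity <- unname(
  pop_coefs["intercept"] +
    sum(pop_coefs[names(hypothetical_song)] * hypothetical_song)
)
predicted_popularity

# With every feature set to 0, the fitted value is just the intercept.
all_zero <- c(valence = 0, danceability = 0, speechiness = 0)
baseline_popularity <- unname(
  pop_coefs["intercept"] + sum(pop_coefs[names(all_zero)] * all_zero)
)
```

Since a song with all three features at exactly 0 is unlikely to occur, the intercept is best read as an anchor for the fitted line rather than a meaningful prediction.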

More generally, we can see that valence, overall, seems to be negatively associated with popularity. Danceability seems generally positively associated. Speechiness throws us some mixed results, with a surprisingly positive, statistically significant correlation with popularity within the genre of pop.

I would hesitate to interpret these as directly causal. There are a few reasons for this. One is the presence of potential confounding variables - less calculable elements like how memorable a melody is may contribute both to a measure of danceability and popularity, for example. The relationship between musical features and popularity could also be bidirectional - while the analysis assumes that musical attributes influence popularity, it is plausible that song popularity influences how Spotify measures valence or danceability. However, the fact that there are some statistically significant relationships suggests that there exists a relationship between genre, music features, and popularity. Furthermore, all r-squared values are extremely small. This implies that the included features explain only a small portion of the variability in popularity.
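
For readers less familiar with r-squared, the sketch below shows what it measures: the share of the variance in the outcome that the fitted values account for, computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The data are simulated with a deliberately noisy outcome; none of the numbers come from the Spotify dataset.

```r
# Simulated data: a real but weak signal buried in a lot of noise.
set.seed(42)
toy <- data.frame(danceability = runif(500))
toy$track_popularity <- 40 + 10 * toy$danceability + rnorm(500, sd = 20)

fit <- lm(track_popularity ~ danceability, data = toy)

# R-squared by hand: 1 - (residual sum of squares / total sum of squares).
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((toy$track_popularity - mean(toy$track_popularity))^2)
r_squared_manual <- 1 - ss_res / ss_tot

# The hand-computed value matches what summary() reports.
c(manual = r_squared_manual, reported = summary(fit)$r.squared)
```

A coefficient can be statistically significant in a large sample while the model still leaves most of the variation unexplained, which is the situation in the genre regressions above.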

Below is a regression that was run on the entire dataset, with no subsetting for genre.

# Overall regression for entire dataset
regressions <- lm(track_popularity ~ valence + danceability + speechiness, data = songs)

modelsummary::modelsummary(regressions,
                           statistic = c("s.e. = {std.error}",
                                         "p = {p.value}"),
                           gof_map = c("nobs", "r.squared", "adj.r.squared"),
                           coef_map = varnames)

              (1)
Intercept     34.974
  s.e.        0.645
  p           <0.001
Valence       1.426
  s.e.        0.625
  p           0.023
Danceability  10.554
  s.e.        1.020
  p           <0.001
Speechiness   −1.277
  s.e.        1.381
  p           0.355
Num.Obs.      32833
R2            0.004
R2 Adj.       0.004

Here, we can see that valence and danceability both have a positive, statistically significant relationship with popularity, while the speechiness coefficient is negative and not statistically significant. We again observe very small r-squared values.

Conclusion

The regression table and overall regression both align with my primary hypothesis, which posited that specific musical attributes were related to a track’s popularity within distinct genres.

With regards to my more specific hypotheses, we find mixed results. As predicted, pop, latin, and rap exhibited positive, statistically significant relationships between danceability and popularity. Edm, however, showed a negative danceability coefficient, and r&b and rock showed positive ones, both contrary to my hypotheses. The largely negative relationship between valence and popularity also resists my initial instinct that valence would have a positive relationship within a majority of genres: only edm showed a positive, statistically significant valence coefficient, while the negative valence coefficients in r&b and rock are consistent with the initial hypotheses. The hypothesis regarding speechiness receives limited support. Speechiness was positively and significantly associated with popularity in latin, as predicted, but also in pop, where I expected a negative relationship. It was negatively associated with popularity in rap, where I expected a positive relationship, and it was not statistically significant in rock, r&b, or edm.

The models, although explaining only a very small proportion of the variability in popularity, provide some genre-specific insight into the relationship between music features and audience preferences. While causation cannot be inferred, as stated earlier, the observed patterns suggest that the association between musical features and popularity differs across genres.

However, it’s crucial to acknowledge several limitations of this study. First, as mentioned, the models have very low R-squared values, meaning they cannot account for most of the variability in popularity. Additionally, the assumption of linearity may not fully capture the complex relationships between musical features and popularity; as we saw using the plotted loess models, nonlinear regressions may have been more appropriate. Threats to inference include the possibility of confounding variables not accounted for in the models. Addressing missing data and exploring additional potential confounders could enhance the robustness of the analysis. If more time and resources were available, conducting more sophisticated analyses, such as considering nonlinear relationships, exploring interactions between features, or expanding the list of features, could further refine the understanding of the factors influencing popularity in different music genres.
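
As a rough illustration of what such an extension might look like, the sketch below compares the linear specification used in this study with a more flexible one that adds a squared danceability term and a valence-by-danceability interaction. This is only one possible specification, chosen for the example. The data are simulated, so the output says nothing about the real songs.

```r
# Simulated data with a curved danceability effect and a feature interaction.
set.seed(7)
n <- 400
toy <- data.frame(
  valence      = runif(n),
  danceability = runif(n),
  speechiness  = runif(n)
)
toy$track_popularity <- 40 + 30 * toy$danceability - 25 * toy$danceability^2 +
  10 * toy$valence * toy$danceability + rnorm(n, sd = 10)

# Specification used in this study: additive and linear in each feature.
linear_fit <- lm(track_popularity ~ valence + danceability + speechiness,
                 data = toy)

# More flexible specification: adds valence:danceability and danceability^2.
flexible_fit <- lm(
  track_popularity ~ valence * danceability + speechiness + I(danceability^2),
  data = toy
)

# F-test for whether the extra terms improve on the linear model.
model_comparison <- anova(linear_fit, flexible_fit)
model_comparison
```

Because the linear model is nested within the flexible one, `anova()` can test whether the added terms reduce the residual variation by more than chance would suggest.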